AITopics | humanoid control

humanoid control

SONIC: Supersizing Motion Tracking for Natural Humanoid Whole-Body Control

Luo, Zhengyi, Yuan, Ye, Wang, Tingwu, Li, Chenran, Chen, Sirui, Castañeda, Fernando, Cao, Zi-Ang, Li, Jiefeng, Minor, David, Ben, Qingwei, Da, Xingye, Ding, Runyu, Hogg, Cyrus, Song, Lina, Lim, Edy, Jeong, Eugene, He, Tairan, Xue, Haoru, Xiao, Wenli, Wang, Zi, Yuen, Simon, Kautz, Jan, Chang, Yan, Iqbal, Umar, Fan, Linxi "Jim", Zhu, Yuke

arXiv.org Artificial Intelligence, Dec-8-2025

Despite the rise of billion-parameter foundation models trained across thousands of GPUs, similar scaling gains have not been shown for humanoid control. Current neural controllers for humanoids remain modest in size, target a limited set of behaviors, and are trained on a handful of GPUs over several days. We show that scaling up model capacity, data, and compute yields a generalist humanoid controller capable of creating natural and robust whole-body movements. Specifically, we posit motion tracking as a natural and scalable task for humanoid control, leveraging dense supervision from diverse motion-capture data to acquire human motion priors without manual reward engineering. We build a foundation model for motion tracking by scaling along three axes: network size (from 1.2M to 42M parameters), dataset volume (over 100M frames, 700 hours of high-quality motion data), and compute (9k GPU hours). Beyond demonstrating the benefits of scale, we show the practical utility of our model through two mechanisms: (1) a real-time universal kinematic planner that bridges motion tracking to downstream task execution, enabling natural and interactive control, and (2) a unified token space that supports various motion input interfaces, such as VR teleoperation devices, human videos, and vision-language-action (VLA) models, all using the same policy. Scaling motion tracking exhibits favorable properties: performance improves steadily with increased compute and data diversity, and learned representations generalize to unseen motions, establishing motion tracking at scale as a practical foundation for humanoid control.
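
The abstract frames motion tracking as a source of dense supervision: every frame of a reference clip gives the controller a target to match. The abstract does not state the objective it uses, so the sketch below only illustrates the general idea with an exponential tracking score, a common choice in the motion-imitation literature. The function name, the `sharpness` parameter, and the joint values are all assumptions made for illustration.

```python
import math

def tracking_reward(joint_pos, ref_joint_pos, sharpness=2.0):
    """Dense per-frame score: 1.0 for a perfect match, decaying toward 0 as error grows.

    `sharpness` is a hypothetical tuning constant, not a value from the paper.
    """
    if len(joint_pos) != len(ref_joint_pos):
        raise ValueError("pose and reference must have the same number of joints")
    sq_err = sum((q - r) ** 2 for q, r in zip(joint_pos, ref_joint_pos))
    mean_sq_err = sq_err / len(joint_pos)
    return math.exp(-sharpness * mean_sq_err)

# Hypothetical joint angles (radians) for one motion-capture frame.
reference = [0.0, 0.5, -0.3]
print(tracking_reward(reference, reference))         # 1.0
print(tracking_reward([0.1, 0.4, -0.3], reference))  # slightly below 1.0
```

Because the score is defined for every frame of every clip, adding more motion data adds more supervision without writing a new task-specific reward.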

artificial intelligence, arxiv preprint arxiv, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2511.0782

Country: Asia (0.28)

Genre: Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Behavior Foundation Model for Humanoid Robots

Zeng, Weishuai, Lu, Shunlin, Yin, Kangning, Niu, Xiaojie, Dai, Minyue, Wang, Jingbo, Pang, Jiangmiao

arXiv.org Artificial Intelligence, Sep-18-2025

Whole-body control (WBC) of humanoid robots has witnessed remarkable progress in skill versatility, enabling a wide range of applications such as locomotion, teleoperation, and motion tracking. Despite these achievements, existing WBC frameworks remain largely task-specific, relying heavily on labor-intensive reward engineering and demonstrating limited generalization across tasks and skills. These limitations hinder their response to arbitrary control modes and restrict their deployment in complex, real-world scenarios. To address these challenges, we revisit existing WBC systems and identify a shared objective across diverse tasks: the generation of appropriate behaviors that guide the robot toward desired goal states. Building on this insight, we propose the Behavior Foundation Model (BFM), a generative model pretrained on large-scale behavioral datasets to capture broad, reusable behavioral knowledge for humanoid robots. BFM integrates a masked online distillation framework with a Conditional Variational Autoencoder (CVAE) to model behavioral distributions, thereby enabling flexible operation across diverse control modes and efficient acquisition of novel behaviors without retraining from scratch. Extensive experiments in both simulation and on a physical humanoid platform demonstrate that BFM generalizes robustly across diverse WBC tasks while rapidly adapting to new behaviors. These results establish BFM as a promising step toward a foundation model for general-purpose humanoid control.
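
For readers unfamiliar with the Conditional Variational Autoencoder mentioned above, the following is a minimal sketch of its two generic ingredients: a latent sample drawn with the reparameterization trick and a KL penalty toward a standard normal prior. It is not BFM's architecture; the function names, latent size, and the `goal` condition vector are assumptions for illustration.

```python
import math
import random

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def decoder_input(latent, condition):
    """A CVAE decoder sees the latent code together with the conditioning signal."""
    return list(latent) + list(condition)

rng = random.Random(0)
mu, log_var = [0.2, -0.1], [0.0, -1.0]  # hypothetical encoder outputs
goal = [1.0, 0.0, 0.0]                  # hypothetical goal-state condition
z = sample_latent(mu, log_var, rng)
print(len(decoder_input(z, goal)))                    # 5
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0
```

Conditioning the decoder on a goal is what lets one generative model produce different behaviors for different control inputs.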

artificial intelligence, bfm, control mode, (12 more...)

arXiv.org Artificial Intelligence

2509.1378

Country: Asia > China (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.83)

Emergent Active Perception and Dexterity of Simulated Humanoids from Visual Reinforcement Learning

Luo, Zhengyi, Tessler, Chen, Lin, Toru, Yuan, Ye, He, Tairan, Xiao, Wenli, Guo, Yunrong, Chechik, Gal, Kitani, Kris, Fan, Linxi, Zhu, Yuke

arXiv.org Artificial Intelligence, May-20-2025

Human behavior is fundamentally shaped by visual perception -- our ability to interact with the world depends on actively gathering relevant information and adapting our movements accordingly. Behaviors like searching for objects, reaching, and hand-eye coordination naturally emerge from the structure of our sensory system. Inspired by these principles, we introduce Perceptive Dexterous Control (PDC), a framework for vision-driven dexterous whole-body control with simulated humanoids. PDC operates solely on egocentric vision for task specification, enabling object search, target placement, and skill selection through visual cues, without relying on privileged state information (e.g., 3D object positions and geometries). This perception-as-interface paradigm enables learning a single policy to perform multiple household tasks, including reaching, grasping, placing, and articulated object manipulation. We also show that training from scratch with reinforcement learning can produce emergent behaviors such as active search. These results demonstrate how vision-driven control and complex tasks induce human-like behaviors and can serve as the key ingredients in closing the perception-action loop for animation, robotics, and embodied AI.

arxiv preprint arxiv, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2505.12278

Country: North America (0.28)

Genre: Research Report > New Finding (0.48)

Industry:

Education (0.68)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.68)

Humanoid-VLA: Towards Universal Humanoid Control with Visual Integration

Ding, Pengxiang, Ma, Jianfei, Tong, Xinyang, Zou, Binghong, Luo, Xinxin, Fan, Yiguo, Wang, Ting, Lu, Hongchao, Mo, Panzhong, Liu, Jinxin, Wang, Yuefan, Zhou, Huaicheng, Feng, Wenshuo, Liu, Jiacheng, Huang, Siteng, Wang, Donglin

arXiv.org Artificial Intelligence, Feb-21-2025

This paper addresses the limitations of current humanoid robot control frameworks, which primarily rely on reactive mechanisms and lack autonomous interaction capabilities due to data scarcity. We propose Humanoid-VLA, a novel framework that integrates language understanding, egocentric scene perception, and motion control, enabling universal humanoid control. Humanoid-VLA begins with language-motion pre-alignment using non-egocentric human motion datasets paired with textual descriptions, allowing the model to learn universal motion patterns and action semantics. We then incorporate egocentric visual context through parameter-efficient video-conditioned fine-tuning, enabling context-aware motion generation. Furthermore, we introduce a self-supervised data augmentation strategy that automatically generates pseudo-annotations derived directly from motion data. This process converts raw motion sequences into informative question-answer pairs, facilitating the effective use of large-scale unlabeled video data. Built upon whole-body control architectures, Humanoid-VLA is shown in extensive experiments to accomplish object interaction and environment exploration tasks with enhanced contextual awareness, demonstrating a more human-like capacity for adaptive and intelligent engagement.

arxiv preprint arxiv, dataset, motion data, (12 more...)

arXiv.org Artificial Intelligence

2502.14795

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)

HOVER: Versatile Neural Whole-Body Controller for Humanoid Robots

He, Tairan, Xiao, Wenli, Lin, Toru, Luo, Zhengyi, Xu, Zhenjia, Jiang, Zhenyu, Kautz, Jan, Liu, Changliu, Shi, Guanya, Wang, Xiaolong, Fan, Linxi, Zhu, Yuke

arXiv.org Artificial Intelligence, Oct-28-2024

Humanoid whole-body control requires adapting to diverse tasks such as navigation, loco-manipulation, and tabletop manipulation, each demanding a different mode of control. For example, navigation relies on root velocity tracking, while tabletop manipulation prioritizes upper-body joint angle tracking. Existing approaches typically train individual policies tailored to a specific command space, limiting their transferability across modes. We present the key insight that full-body kinematic motion imitation can serve as a common abstraction for all these tasks and provide general-purpose motor skills for learning multiple modes of whole-body control. Building on this, we propose HOVER (Humanoid Versatile Controller), a multi-mode policy distillation framework that consolidates diverse control modes into a unified policy. HOVER enables seamless transitions between control modes while preserving the distinct advantages of each, offering a robust and scalable solution for humanoid control across a wide range of modes. By eliminating the need for policy retraining for each control mode, our approach improves efficiency and flexibility for future humanoid applications.
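
One way a single policy can serve several command spaces is to give it one full-body command and switch off the parts a given mode does not use. The sketch below illustrates that idea using the two examples from the abstract (root velocity for navigation, upper-body joint angles for tabletop manipulation). It is an illustration only, not the paper's implementation: the mode table, field names, and `mask_command` helper are hypothetical.

```python
# Hypothetical mapping from control mode to the command fields it tracks.
MODES = {
    "navigation": {"root_velocity"},
    "tabletop": {"upper_body_joints"},
    "full_body": {"root_velocity", "upper_body_joints", "lower_body_joints"},
}

def mask_command(full_command, mode):
    """Zero out command fields the mode ignores and return a per-field mask.

    The policy would receive both outputs, so it can tell "target is zero"
    apart from "target is not specified".
    """
    active = MODES[mode]
    masked, mask = {}, {}
    for name, values in full_command.items():
        is_active = name in active
        mask[name] = 1.0 if is_active else 0.0
        masked[name] = list(values) if is_active else [0.0] * len(values)
    return masked, mask

# Hypothetical full-body command for one control step.
command = {
    "root_velocity": [1.0, 0.0, 0.2],
    "upper_body_joints": [0.3, 0.4],
    "lower_body_joints": [0.1],
}
masked, mask = mask_command(command, "navigation")
print(masked["root_velocity"])      # [1.0, 0.0, 0.2]
print(masked["upper_body_joints"])  # [0.0, 0.0]
```

Switching modes then only changes the mask, not the policy being run.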

artificial intelligence, hover, human computer interaction, (16 more...)

arXiv.org Artificial Intelligence

2410.21229

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report (0.64)

Industry: Education (0.68)

Technology:

Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (0.46)
Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.42)

Hierarchical World Models as Visual Whole-Body Humanoid Controllers

Hansen, Nicklas, S, Jyothir V, Sobal, Vlad, LeCun, Yann, Wang, Xiaolong, Su, Hao

arXiv.org Artificial Intelligence, May-31-2024

Whole-body control for humanoids is challenging due to the high-dimensional nature of the problem, coupled with the inherent instability of a bipedal morphology. Learning from visual observations further exacerbates this difficulty. In this work, we explore highly data-driven approaches to visual whole-body humanoid control based on reinforcement learning, without any simplifying assumptions, reward design, or skill primitives. Specifically, we propose a hierarchical world model in which a high-level agent generates commands based on visual observations for a low-level agent to execute, both of which are trained with rewards. Our approach produces highly performant control policies in 8 tasks with a simulated 56-DoF humanoid, while synthesizing motions that are broadly preferred by humans.

agent, humanoid control, world model, (13 more...)

arXiv.org Artificial Intelligence

2405.18418

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Asia > South Korea > Daegu > Daegu (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

H-GAP: Humanoid Control with a Generalist Planner

Jiang, Zhengyao, Xu, Yingchen, Wagener, Nolan, Luo, Yicheng, Janner, Michael, Grefenstette, Edward, Rocktäschel, Tim, Tian, Yuandong

arXiv.org Artificial Intelligence, Dec-5-2023

Humanoid control is an important research challenge offering avenues for integration into human-centric infrastructures and enabling physics-driven humanoid animations. The daunting challenges in this field stem from the difficulty of optimizing in high-dimensional action spaces and the instability introduced by the bipedal morphology of humanoids. However, the extensive collection of human motion-captured data and the derived datasets of humanoid trajectories, such as MoCapAct, pave the way to tackle these challenges. In this context, we present Humanoid Generalist Autoencoding Planner (H-GAP), a state-action trajectory generative model trained on humanoid trajectories derived from human motion-captured data, capable of adeptly handling downstream control tasks with Model Predictive Control (MPC). For a humanoid with 56 degrees of freedom, we empirically demonstrate that H-GAP learns to represent and generate a wide range of motor behaviors. Further, without any learning from online interactions, it can also flexibly transfer these behaviors to solve novel downstream control tasks via planning. Notably, H-GAP outperforms established MPC baselines that have access to the ground truth dynamics model, and is superior or comparable to offline RL methods trained for individual tasks. Finally, we conduct a series of empirical studies on the scaling properties of H-GAP, showing the potential for performance gains via additional data but not additional compute. Code and videos are available at https://ycxuyingchen.github.io/hgap/.
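
The planning loop behind Model Predictive Control can be sketched in a few lines: propose candidate action sequences, score each with a model, execute only the first action of the best one, then replan. H-GAP draws its candidates from a learned trajectory model; the sketch below swaps in uniform random actions and a toy 1-D point mass, so every function, constant, and the cost are assumptions for illustration rather than the paper's method.

```python
import random

def step(state, action):
    """Toy 1-D point-mass dynamics, a stand-in for a learned trajectory model."""
    pos, vel = state
    vel = vel + 0.1 * action
    pos = pos + 0.1 * vel
    return (pos, vel)

def rollout_cost(state, actions, target):
    """Sum of squared distances to the target along the predicted rollout."""
    cost = 0.0
    for action in actions:
        state = step(state, action)
        cost += (state[0] - target) ** 2
    return cost

def plan(state, target, horizon=10, num_samples=200, rng=None):
    """Random-shooting MPC: return the first action of the cheapest sampled sequence."""
    rng = rng or random.Random(0)
    best_actions, best_cost = None, float("inf")
    for _ in range(num_samples):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        cost = rollout_cost(state, actions, target)
        if cost < best_cost:
            best_actions, best_cost = actions, cost
    return best_actions[0], best_cost

def run_mpc(steps=50, target=1.0):
    """Receding-horizon loop: replan at every step, apply one action."""
    state = (0.0, 0.0)
    rng = random.Random(0)
    for _ in range(steps):
        action, _ = plan(state, target, rng=rng)
        state = step(state, action)
    return state

print(run_mpc())
```

Replacing the uniform sampler with a generative model trained on motion data would bias the candidates toward plausible humanoid motions, which is the role the abstract assigns to H-GAP.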

machine learning, natural language, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2312.02682

Country:

Asia (0.28)
North America > United States > Maryland (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Energy > Oil & Gas (0.76)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)